Architecture independent short vector FFTs

Authors

  • Franz Franchetti
  • Herbert Karner
  • Stefan Kral
  • Christoph W. Ueberhuber

Abstract

This paper introduces a SIMD vectorization for FFTW, the “fastest Fourier transform in the west” by Matteo Frigo and Steven Johnson. The new method yields an architecture independent short vector SIMD FFT vectorization that exploits FFTW's ability to adapt to the target architecture. It is based on special FFT kernels (including sizes of 64 and more) that FFTW uses to compute the whole transform. This vectorization supports all features of complex transforms in FFTW (arbitrary size, dimension and stride of the data vector; in-place and out-of-place transforms) and is fully transparent to the user. It is suitable for arbitrary vector lengths of the underlying hardware.
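
A minimal sketch of what a short vector FFT kernel can look like, assuming one simple vectorization strategy: several independent transforms run side by side, one per vector lane, through a size-4 DFT kernel written once as straight-line code over short vectors. The names `VLEN`, `vadd`, `vsub`, `vmul_minus_i` and `dft4_vec` are hypothetical, lanes are emulated with Python tuples, and nothing here reproduces FFTW's actual codelets or the paper's vectorization scheme.

```python
VLEN = 4  # hypothetical lane count, e.g. four single-precision values per 128-bit register

# A short vector is modelled as a tuple with one entry per lane.
def vadd(a, b):
    return tuple(x + y for x, y in zip(a, b))

def vsub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def vmul_minus_i(a):
    """Multiply every lane by -i, the twiddle factor exp(-2*pi*i/4)."""
    return tuple(-1j * x for x in a)

def dft4_vec(x):
    """Forward size-4 DFT applied to VLEN independent transforms at once.

    x[j] is a short vector whose lane l holds element j of transform l, so the
    same straight-line kernel serves every lane.
    """
    t0, t1 = vadd(x[0], x[2]), vsub(x[0], x[2])
    t2, t3 = vadd(x[1], x[3]), vsub(x[1], x[3])
    u = vmul_minus_i(t3)
    # X0 = t0 + t2, X1 = t1 - i*t3, X2 = t0 - t2, X3 = t1 + i*t3
    return [vadd(t0, t2), vadd(t1, u), vsub(t0, t2), vsub(t1, u)]

# Usage: VLEN input signals of length 4, regrouped so that vector j
# collects element j of every signal (one signal per lane).
signals = [[complex(l + j, -j) for j in range(4)] for l in range(VLEN)]
x = [tuple(signals[l][j] for l in range(VLEN)) for j in range(4)]
X = dft4_vec(x)  # X[k][l] is output k of signal l
```

The kernel itself never refers to `VLEN`, so the same code works for any lane count; this mirrors, in a much simplified form, the abstract's point that the method suits arbitrary hardware vector lengths.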

Similar resources

Automatic generation of prime length FFT programs

We describe a set of programs for circular convolution and prime length FFTs that are relatively short, possess great structure, share many computational procedures, and cover a large variety of lengths. The programs make clear the structure of the algorithms and clearly enumerate independent computational branches that can be performed in parallel. Moreover, each of these independent operation...
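
The connection between circular convolution and prime length DFTs can be illustrated with Rader's algorithm, a standard reindexing that turns a DFT of prime length p into a cyclic convolution of length p - 1. This is a minimal sketch for illustration only: it is not claimed to be the method of the paper above, the helper names are illustrative, and the convolution is computed directly in O(p²) operations rather than with a fast convolution program.

```python
import cmath

def primitive_root(p):
    """Smallest generator of the multiplicative group modulo the prime p."""
    for g in range(1, p):
        if len({pow(g, q, p) for q in range(p - 1)}) == p - 1:
            return g
    raise ValueError("p must be prime")

def cyclic_convolution(a, b):
    """Direct length-n circular convolution c[m] = sum_q a[q] * b[(m - q) mod n]."""
    n = len(a)
    return [sum(a[q] * b[(m - q) % n] for q in range(n)) for m in range(n)]

def dft_prime_rader(x):
    """DFT of prime length p via Rader's reindexing into a length-(p-1) cyclic convolution."""
    p = len(x)
    g = primitive_root(p)
    w = cmath.exp(-2j * cmath.pi / p)
    # Input permuted by powers of g; kernel uses w raised to powers of g^(-1).
    a = [x[pow(g, q, p)] for q in range(p - 1)]
    b = [w ** pow(g, (p - 1 - q) % (p - 1), p) for q in range(p - 1)]
    c = cyclic_convolution(a, b)
    X = [0j] * p
    X[0] = sum(x)
    for m in range(p - 1):
        # Output index g^(-m) mod p receives x[0] plus convolution entry m.
        X[pow(g, (p - 1 - m) % (p - 1), p)] = x[0] + c[m]
    return X
```

For example, `dft_prime_rader([1, 2, 3, 4, 5])` matches the DFT computed by definition up to rounding; the length-4 `cyclic_convolution` inside it is the kind of subproblem where a specialized convolution program could be substituted.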

Efficient FFTs on IRAM

Computing Fast Fourier Transforms (FFTs) is notoriously difficult on conventional general-purpose architectures because FFTs require high memory bandwidth and strided memory accesses. Since FFTs are important in signal processing, several DSPs have hardware support for performing FFTs; moreover, some DSPs are designed solely for the purpose of computing FFTs and related transforms. In this pape...

Fast Fourier Transform

A fast Fourier transform (FFT) is an efficient algorithm to compute the discrete Fourier transform (DFT) of an input vector. Efficient means that the FFT computes the DFT of an n-element vector in O(n log n) operations in contrast to the O(n²) operations required for computing the DFT by definition. FFTs exist for any vector length n and for real and higher-dimensional data. Parallel FFTs have b...
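
To make the O(n log n) versus O(n²) contrast concrete, a minimal sketch that computes the DFT once by definition and once with the textbook recursive radix-2 Cooley-Tukey FFT. The radix-2 variant handles only power-of-two lengths and is just one of many FFT algorithms; the function names are illustrative.

```python
import cmath

def dft(x):
    """DFT by definition: n sums of n terms, i.e. O(n^2) operations."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def fft_radix2(x):
    """Recursive radix-2 Cooley-Tukey FFT, O(n log n); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft_radix2(x[0::2])  # DFT of even-indexed samples
    odd = fft_radix2(x[1::2])   # DFT of odd-indexed samples
    X = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor times odd part
        X[k] = even[k] + t
        X[k + n // 2] = even[k] - t
    return X
```

Each recursion level performs O(n) butterfly work and there are log₂ n levels, which gives the O(n log n) count, whereas `dft` evaluates all n² terms directly.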

Multiprocessor FFTs

Several multiprocessor FFTs are developed in this paper for both vector multiprocessors with shared memory and the hypercube. Two FFTs for vector multiprocessors are given that compute an ordered transform and have a stride of one except for a single "link" step. Since multiple FFTs provide additional options for both vectorization and distribution we show that a single FFT can be performed in ...
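
One classical way to express a single FFT in terms of multiple smaller FFTs is the four-step decomposition for n = n1·n2, sketched below as an illustration of the general idea. It is not claimed to be one of this paper's algorithms, and its reordering step need not match the paper's "link" step; the sub-DFTs are computed by definition to keep the sketch short.

```python
import cmath

def dft(x):
    """DFT by definition, used here for the small sub-transforms."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def four_step_dft(x, n1, n2):
    """Length n = n1*n2 DFT built from n1 DFTs of size n2 and n2 DFTs of size n1.

    Index maps: input j = j1 + n1*j2, output k = k2 + n2*k1.
    """
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")
    # Step 1: n1 independent DFTs of size n2 (one per j1) over x[j1 + n1*j2].
    y = [dft(x[j1::n1]) for j1 in range(n1)]  # y[j1][k2]
    # Step 2: multiply by the twiddle factors w_n^(j1*k2).
    for j1 in range(n1):
        for k2 in range(n2):
            y[j1][k2] *= cmath.exp(-2j * cmath.pi * j1 * k2 / n)
    # Step 3: n2 independent DFTs of size n1 (one per k2) over j1.
    z = [dft([y[j1][k2] for j1 in range(n1)]) for k2 in range(n2)]  # z[k2][k1]
    # Step 4: reorder (a transpose) so the output is in natural order.
    return [z[k % n2][k // n2] for k in range(n)]
```

Steps 1 and 3 are batches of independent DFTs, which is what makes this form attractive both for vectorization across the batch and for distribution over processors; step 4 is a pure data permutation.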

An Abstraction Layer for SIMD Extensions

This paper presents an abstraction layer for short vector SIMD ISA extensions like Intel’s SSE, AMD’s 3DNow!, Motorola’s AltiVec, and IBM’s Double Hummer. It provides unified access to short vector instructions via intermediate level building blocks. These primitives are C macros that allow, for instance, portable and highly efficient implementations of discrete linear transforms like FFTs and ...
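
A minimal sketch of the layering idea, not the paper's actual macro set: a few lane-wise primitives (the only part a port to another SIMD extension would replace) and one intermediate building block, complex multiplication on split-format vectors, written purely in terms of them. The names `vadd`, `vsub`, `vmul` and `vcmul` are hypothetical, and Python functions stand in for the C macros.

```python
import cmath

VLEN = 4  # hypothetical lane count

# Lowest level: lane-wise primitives (hypothetical). Porting to another SIMD
# extension would only replace these; here they are plain scalar emulations.
def vadd(a, b):
    return [x + y for x, y in zip(a, b)]

def vsub(a, b):
    return [x - y for x, y in zip(a, b)]

def vmul(a, b):
    return [x * y for x, y in zip(a, b)]

# Intermediate building block: complex multiply on split-format vectors
# (real parts in one vector, imaginary parts in another), using primitives only.
def vcmul(ar, ai, br, bi):
    return vsub(vmul(ar, br), vmul(ai, bi)), vadd(vmul(ar, bi), vmul(ai, br))

# Usage: multiply VLEN complex values by VLEN twiddle factors exp(-2*pi*i*l/8) at once.
data = [complex(1 + l, -l) for l in range(VLEN)]
tw = [cmath.exp(-2j * cmath.pi * l / 8) for l in range(VLEN)]
out_re, out_im = vcmul([d.real for d in data], [d.imag for d in data],
                       [t.real for t in tw], [t.imag for t in tw])
```

A twiddle pass written against `vcmul` stays unchanged when the primitives are remapped to different instructions, which is the portability argument this abstract makes for its C macros.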

Publication date: 2001